
    Computational identification of transcriptional regulatory elements in DNA sequence

    Identification and annotation of all the functional elements in the genome, including genes and regulatory sequences, is a fundamental challenge in genomics and computational biology. Since regulatory elements are frequently short and variable, their identification and discovery using computational algorithms is difficult. However, significant advances have been made in the computational methods for modeling and detection of DNA regulatory elements. The availability of complete genome sequences from multiple organisms, as well as mRNA profiling and high-throughput experimental methods for mapping protein-binding sites in DNA, has contributed to the development of methods that utilize these auxiliary data to inform the detection of transcriptional regulatory elements. Progress is also being made in the identification of cis-regulatory modules and higher-order structures of the regulatory sequences, which is essential to the understanding of transcription regulation in metazoan genomes. This article reviews the computational approaches for modeling and identification of genomic regulatory elements, with an emphasis on recent developments and current challenges.

    Cis-regulatory variations: A study of SNPs around genes showing cis-linkage in segregating mouse populations

    BACKGROUND: Changes in gene expression are known to be responsible for phenotypic variation and susceptibility to diseases. Identification and annotation of the genomic sequence variants that cause gene expression changes is therefore likely to lead to a better understanding of the cause of disease at the molecular level. In this study we investigate the pattern of single nucleotide polymorphisms (SNPs) in genes for which the mRNA levels show cis-genetic linkage (gene expression quantitative trait loci mapping in cis, or cis-eQTLs) in segregating mouse populations. Such genes are expected to have polymorphisms near their physical location (cis-variations) that affect their mRNA levels by altering one or more of the cis-regulatory elements. This led us to characterize the SNPs in promoter (5 kb upstream) and non-coding gene regions (introns and 5 kb downstream) (cis-SNPs) and the effects they may have on putative transcription factor binding sites. RESULTS: We demonstrate that the cis-eQTL genes (CEGs) have a significantly higher frequency of cis-SNPs compared to non-CEGs (when both sets are taken from the non-IBD regions, i.e. regions not identical by descent). Most CEGs having cis-SNPs do not contain these SNPs in the phylogenetically conserved regions. In those CEGs that contain cis-SNPs in the phylogenetically conserved regions, enrichment of cis-SNPs occurs both within and outside of the conserved sequences. A higher fraction of CEGs are also seen to harbor cis-SNPs that affect predicted transcription factor binding sites, a likely consequence of the higher cis-SNP density in these genes. CONCLUSION: The present study provides the first genome-wide investigation of the putative cis-regulatory variations in a large set of genes whose levels of expression give rise to cis-linkage in segregating mammalian populations. Our results provide insights into the challenges that exist in identifying polymorphisms regulating gene expression using bioinformatic sequence analysis approaches. The data provided herein should benefit future investigations in this area.

    Exon and junction microarrays detect widespread mouse strain- and sex-bias expression differences

    Background: Studies have shown that genetic and sex differences strongly influence gene expression in mice. Given the diversity and complexity of transcripts produced by alternative splicing, we sought to use microarrays to establish the extent of variation found across mouse strains and sexes. Here, we surveyed the effect of strain and sex on liver gene and exon expression using male and female mice from three different inbred strains. Results: 71 liver RNA samples from three mouse strains - DBA/2J, C57BL/6J and C3H/HeJ - were profiled using a custom-designed microarray monitoring exon and exon-junction expression of 1,020 genes representing 9,406 exons. Gene expression was calculated via two different methods, using the 3'-most exon probe ("3' gene expression profiling") and using all probes associated with the gene ("whole-transcript gene expression profiling"), while exon expression was determined using exon probes and flanking junction probes that spanned across the neighboring exons ("exon expression profiling"). Widespread strain and sex influences were detected using a two-way Analysis of Variance (ANOVA) regardless of the profiling method used. However, over 90% of the genes identified in 3' gene expression profiling or whole-transcript profiling were identified in exon profiling, along with 75% and 38% more genes, respectively, showing evidence of differential isoform expression. Overall, 55% and 32% of genes, respectively, exhibited strain- and sex-bias differential gene or exon expression. Conclusion: Exon expression profiling identifies significantly more variation than both 3' gene expression profiling and whole-transcript gene expression profiling. A large percentage of genes that are not differentially expressed at the gene level demonstrate exon expression variation, suggesting an influence of strain and sex on alternative splicing and a need to profile expression changes at sub-gene resolution.

    A comprehensive transcript index of the human genome generated using microarrays and computational approaches

    BACKGROUND: Computational and microarray-based experimental approaches were used to generate a comprehensive transcript index for the human genome. Oligonucleotide probes designed from approximately 50,000 known and predicted transcript sequences from the human genome were used to survey transcription from a diverse set of 60 tissues and cell lines using ink-jet microarrays. Further, expression activity over at least six conditions was more generally assessed using genomic tiling arrays consisting of probes tiled through a repeat-masked version of the genomic sequence making up chromosomes 20 and 22. RESULTS: The combination of microarray data with extensive genome annotations resulted in a set of 28,456 experimentally supported transcripts. This set of high-confidence transcripts represents the first experimentally driven annotation of the human genome. In addition, the results from genomic tiling suggest that a large amount of transcription exists outside of annotated regions of the genome and serve as an example of how this activity could be measured on a genome-wide scale. CONCLUSIONS: These data represent one of the most comprehensive assessments of transcriptional activity in the human genome and provide an atlas of human gene expression over a unique set of gene predictions. Before the annotation of the human genome is considered complete, however, the previously unannotated transcriptional activity throughout the genome must be fully characterized.

    Novel transcription regulatory elements in Caenorhabditis elegans muscle genes

    We report the identification of three new transcription regulatory elements that are associated with muscle gene expression in the nematode Caenorhabditis elegans. Starting from a subset of well-characterized nematode muscle genes, we identified conserved DNA motifs in the promoter regions using computational DNA pattern-recognition algorithms. These were considered to be putative muscle transcription regulatory motifs. Using green fluorescent protein (GFP) as a reporter, experiments were done to determine the biological activity of these motifs in driving muscle gene expression. Prediction accuracy of muscle expression based on the presence of these three motifs was encouraging; nine of 10 previously uncharacterized genes that were predicted to have muscle expression were shown to be expressed either specifically or selectively in the muscle tissues, whereas only one of the nine that scored low for these motifs was expressed in muscle. Knockouts of putative regulatory elements in the promoters of the mlc-2 and unc-89 genes show that they significantly contribute to muscle expression and act in a synergistic manner. We find that these DNA motifs are also present in the muscle promoters of C. briggsae, indicating that they are functionally conserved in the nematodes.

    Conserved homeodomain proteins interact with MADS box protein Mcm1 to restrict ECB-dependent transcription to the M/G1 phase of the cell cycle

    Two homeodomain proteins, Yox1 and Yhp1, act as repressors at early cell cycle boxes (ECBs) to restrict their activity to the M/G1 phase of the cell cycle in budding yeast. These proteins bind to Mcm1 and to a typical homeodomain binding site. The expression of Yox1 is periodic and directly correlated with its binding to, and repression of, ECB activity. The absence of Yox1 and Yhp1 or the constitutive expression of Yox1 leads to the loss of cell-cycle regulation of ECB activity. Therefore, the cell-cycle-regulated expression of these repressors defines the interval of ECB-dependent transcription. Twenty-eight genes that are coordinately regulated by this pathway have been identified, including MCM2-7, CDC6, SWI4, CLN3, and a number of genes required during late M phase.

    Coevolution of protein and RNA structures within a highly conserved ribosomal domain

    Summary: The X-ray crystal structure of a ribosomal L11-rRNA complex with chloroplast-like mutations in both protein and rRNA is presented. The global structure is almost identical to that of the wild-type (bacterial) complex, with only a small movement of the protein α helix away from the surface of the RNA required to accommodate the altered protein residue. In contrast, the specific hydrogen bonding pattern of the mutated residues is substantially different, and now includes a direct interaction between the protein side chain and an RNA base edge and a water-mediated contact. Comparison of the two structures allows the observations of sequence variation and relative affinities of wild-type and mutant complexes to be clearly rationalized, but reinforces the concept that there is no single simple code for protein-RNA recognition.